OCR exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Author name recognition in degraded journal images

Internal identifier: 001181 (Main/Exploration); previous: 001180; next: 001182

Authors: Aliette De Bodard De La Jacopiere [France]; Laurence Likforman-Sulem [France]

Source: Proceedings of SPIE, the International Society for Optical Engineering [ISSN 0277-786X]; 2006

RBID : Pascal:07-0376470

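French descriptors

Pascal: Analyse image; Dégradation; Reconnaissance optique caractère; Région intérêt; Base donnée; Reconnaissance forme; 4230
Wicri (topic): Base de données

English descriptors

KwdEn: Database; Degradation; Image analysis; Interest region; Optical character recognition; Pattern recognition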

Abstract

A method for extracting names in degraded documents is presented in this article. The documents targeted are images of photocopied scientific journals from various scientific domains. Due to the degradation, there is poor OCR recognition, and pieces of other articles appear on the sides of the image. The proposed approach relies on the combination of a low-level textual analysis and an image-based analysis. The textual analysis extracts robust typographic features, while the image analysis selects image regions of interest through anchor components. We report results on the University of Washington benchmark database.


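Affiliations: GET-Ecole Nationale Supérieure des Télécommunications, Signal and Image Processing Department, 46 rue Barrault, 75013 Paris, France (both authors)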



The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en" level="a">Author name recognition in degraded journal images</title>
<author>
<name sortKey="De Bodard De La Jacopiere, Aliette" sort="De Bodard De La Jacopiere, Aliette" uniqKey="De Bodard De La Jacopiere A" first="Aliette" last="De Bodard De La Jacopiere">Aliette De Bodard De La Jacopiere</name>
<affiliation wicri:level="3">
<inist:fA14 i1="01">
<s1>GET-Ecole Nationale Supérieure des Télécommunications Signal and Image Processing Department, 46 rue Barrault</s1>
<s2>75013 Paris</s2>
<s3>FRA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>France</country>
<placeName>
<region type="region" nuts="2">Île-de-France</region>
<settlement type="city">Paris</settlement>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Likforman Sulem, Laurence" sort="Likforman Sulem, Laurence" uniqKey="Likforman Sulem L" first="Laurence" last="Likforman-Sulem">Laurence Likforman-Sulem</name>
<affiliation wicri:level="3">
<inist:fA14 i1="01">
<s1>GET-Ecole Nationale Supérieure des Télécommunications Signal and Image Processing Department, 46 rue Barrault</s1>
<s2>75013 Paris</s2>
<s3>FRA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>France</country>
<placeName>
<region type="region" nuts="2">Île-de-France</region>
<settlement type="city">Paris</settlement>
</placeName>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">INIST</idno>
<idno type="inist">07-0376470</idno>
<date when="2006">2006</date>
<idno type="stanalyst">PASCAL 07-0376470 INIST</idno>
<idno type="RBID">Pascal:07-0376470</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000336</idno>
<idno type="wicri:Area/PascalFrancis/Curation">000450</idno>
<idno type="wicri:Area/PascalFrancis/Checkpoint">000351</idno>
<idno type="wicri:doubleKey">0277-786X:2006:De Bodard De La Jacopiere A:author:name:recognition</idno>
<idno type="wicri:Area/Main/Merge">001212</idno>
<idno type="wicri:Area/Main/Curation">001181</idno>
<idno type="wicri:Area/Main/Exploration">001181</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a">Author name recognition in degraded journal images</title>
<author>
<name sortKey="De Bodard De La Jacopiere, Aliette" sort="De Bodard De La Jacopiere, Aliette" uniqKey="De Bodard De La Jacopiere A" first="Aliette" last="De Bodard De La Jacopiere">Aliette De Bodard De La Jacopiere</name>
<affiliation wicri:level="3">
<inist:fA14 i1="01">
<s1>GET-Ecole Nationale Supérieure des Télécommunications Signal and Image Processing Department, 46 rue Barrault</s1>
<s2>75013 Paris</s2>
<s3>FRA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>France</country>
<placeName>
<region type="region" nuts="2">Île-de-France</region>
<settlement type="city">Paris</settlement>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Likforman Sulem, Laurence" sort="Likforman Sulem, Laurence" uniqKey="Likforman Sulem L" first="Laurence" last="Likforman-Sulem">Laurence Likforman-Sulem</name>
<affiliation wicri:level="3">
<inist:fA14 i1="01">
<s1>GET-Ecole Nationale Supérieure des Télécommunications Signal and Image Processing Department, 46 rue Barrault</s1>
<s2>75013 Paris</s2>
<s3>FRA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>France</country>
<placeName>
<region type="region" nuts="2">Île-de-France</region>
<settlement type="city">Paris</settlement>
</placeName>
</affiliation>
</author>
</analytic>
<series>
<title level="j" type="main">Proceedings of SPIE, the International Society for Optical Engineering</title>
<idno type="ISSN">0277-786X</idno>
<imprint>
<date when="2006">2006</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt>
<title level="j" type="main">Proceedings of SPIE, the International Society for Optical Engineering</title>
<idno type="ISSN">0277-786X</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Database</term>
<term>Degradation</term>
<term>Image analysis</term>
<term>Interest region</term>
<term>Optical character recognition</term>
<term>Pattern recognition</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr">
<term>Analyse image</term>
<term>Dégradation</term>
<term>Reconnaissance optique caractère</term>
<term>Région intérêt</term>
<term>Base donnée</term>
<term>Reconnaissance forme</term>
<term>4230</term>
</keywords>
<keywords scheme="Wicri" type="topic" xml:lang="fr">
<term>Base de données</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">A method for extracting names in degraded documents is presented in this article. The documents targeted are images of photocopied scientific journals from various scientific domains. Due to the degradation, there is poor OCR recognition, and pieces of other articles appear on the sides of the image. The proposed approach relies on the combination of a low-level textual analysis and an image-based analysis. The textual analysis extracts robust typographic features, while the image analysis selects image regions of interest through anchor components. We report results on the University of Washington benchmark database.</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>France</li>
</country>
<region>
<li>Île-de-France</li>
</region>
<settlement>
<li>Paris</li>
</settlement>
</list>
<tree>
<country name="France">
<region name="Île-de-France">
<name sortKey="De Bodard De La Jacopiere, Aliette" sort="De Bodard De La Jacopiere, Aliette" uniqKey="De Bodard De La Jacopiere A" first="Aliette" last="De Bodard De La Jacopiere">Aliette De Bodard De La Jacopiere</name>
</region>
<name sortKey="Likforman Sulem, Laurence" sort="Likforman Sulem, Laurence" uniqKey="Likforman Sulem L" first="Laurence" last="Likforman-Sulem">Laurence Likforman-Sulem</name>
</country>
</tree>
</affiliations>
</record>
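
The record above uses the prefixes `wicri:` and `inist:` without declaring them, so Python's standard-library XML parser rejects it as written with an "unbound prefix" error. The following is a minimal Python sketch, not part of Dilib, that binds those prefixes to placeholder URIs and then reads a few fields. The URIs are assumptions (this page does not give the official namespace identifiers), `RECORD` is a trimmed copy of the record, and `parse_record` and `summarize` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the record above; only a few fields are kept.
RECORD = """<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en" level="a">Author name recognition in degraded journal images</title>
<author>
<name first="Aliette" last="De Bodard De La Jacopiere">Aliette De Bodard De La Jacopiere</name>
<affiliation wicri:level="3">
<inist:fA14 i1="01">
<s2>75013 Paris</s2>
</inist:fA14>
<country>France</country>
</affiliation>
</author>
<author>
<name first="Laurence" last="Likforman-Sulem">Laurence Likforman-Sulem</name>
<affiliation wicri:level="3">
<country>France</country>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="RBID">Pascal:07-0376470</idno>
<date when="2006">2006</date>
</publicationStmt>
</fileDesc>
</teiHeader>
</TEI>
</record>"""

# Placeholder namespace URIs (assumptions, not official identifiers).
NS_DECL = 'xmlns:wicri="urn:example:wicri" xmlns:inist="urn:example:inist"'


def parse_record(text):
    """Bind the undeclared prefixes on the root element, then parse."""
    patched = text.replace("<record>", "<record " + NS_DECL + ">", 1)
    return ET.fromstring(patched)


def summarize(root):
    """Collect title, authors, countries, RBID and year from the header."""
    title_stmt = root.find("./TEI/teiHeader/fileDesc/titleStmt")
    pub_stmt = root.find("./TEI/teiHeader/fileDesc/publicationStmt")
    return {
        "title": title_stmt.findtext("title"),
        "authors": [n.text for n in title_stmt.findall("author/name")],
        "countries": sorted({c.text for c in title_stmt.iter("country")}),
        "rbid": pub_stmt.findtext("idno[@type='RBID']"),
        "year": pub_stmt.find("date").get("when"),
    }


summary = summarize(parse_record(RECORD))
print(summary)
```

Unprefixed elements such as `title` and `author` stay outside any namespace, so plain paths are enough to reach them. Only the prefixed names (`wicri:level`, `inist:fA14`) end up qualified by the placeholder URIs.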

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001181 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 001181 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     Pascal:07-0376470
   |texte=   Author name recognition in degraded journal images
}}
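
When links to many records are needed, the template call above can be generated rather than typed by hand. The following is a minimal Python sketch, assuming only the parameter names and column alignment visible in the call above. `explor_link` is a hypothetical helper, not a Dilib or Wicri tool.

```python
def explor_link(wiki, area, flux, step, key_type, key, text):
    """Render an ``Explor lien`` template call as wikitext."""
    fields = [
        ("wiki", wiki),
        ("area", area),
        ("flux", flux),
        ("étape", step),
        ("type", key_type),
        ("clé", key),
        ("texte", text),
    ]
    lines = ["{{Explor lien"]
    for name, value in fields:
        # Pad "|name=" to 10 columns so the values line up as on this page.
        lines.append("   " + f"|{name}=".ljust(10) + value)
    lines.append("}}")
    return "\n".join(lines)


link = explor_link(
    wiki="Ticri/CIDE",
    area="OcrV1",
    flux="Main",
    step="Exploration",
    key_type="RBID",
    key="Pascal:07-0376470",
    text="Author name recognition in degraded journal images",
)
print(link)
```

The French parameter names (`étape`, `clé`, `texte`) are kept as they appear in the template, since renaming them would presumably break the template call.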

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024